Abstract

The data we analyze derive from the observation of numerous cells of the bacterium
Escherichia coli (E. coli) growing and dividing. Single cells grow and divide to give birth
to two daughter cells, which in turn grow and divide. Thus, a colony of cells from a single
ancestor is structured as a binary genealogical tree. At each node, the measured datum is the
growth rate of the bacterium. In this paper, we study two different data sets. One set
corresponds to small complete trees, whereas the other one corresponds to long specific
sub-trees. Our aim is to compare both sets. This paper is accessible to postgraduate students
and readers with advanced knowledge in statistics.

Keywords: Autoregressive models, Binary tree, Branching processes, Dependent data,
Linear regression models, Tests


1 Introduction 

In this paper, we study two different data sets structured as binary genealogical trees. For
the statistician, this special structure is hard to take into account rigorously because of
the intricate dependence structure within a tree. The data sets come from two different
biological experiments. One set corresponds to small complete trees, whereas the other one
corresponds to long specific sub-trees. Our aim is to compare both sets, which is especially
complicated as they have very different tree structures.


The underlying biological problem concerns the growth of the bacterium Escherichia coli
(E. coli). E. coli is a rod-shaped bacterium with constant width and elongating length, hence
its length (or size) is representative of its biomass or volume. Starting from size $x$ at
birth, the size of the bacterium grows exponentially with time, at a constant rate, until its
division. More specifically, if $T$ is the age of the bacterium at division, there exists a
constant $r$, which will be called the growth rate, such that the size of the bacterium at
time $0 \le t \le T$ equals $x e^{rt}$. E. coli reproduces by binary fission, the mother cell
giving birth to two virtually identical daughter cells. Because of this mode of reproduction,
the observation of single cells growing and dividing for several generations produces data
structured as binary genealogical trees. The growth rate of single cells within such a
genealogical binary tree is the variable of interest throughout this study.

From the statistical point of view, the main difficulty in treating such data is the
dependence structure induced by a (possibly incomplete) binary tree. From the biological point
of view, the main questions of interest are the following. Do sister cells, which are
genetically identical, have the same growth rate? Is there a memory of the growth rate between
mother and daughter cells? Does it also involve the grandmother or higher ancestors? How can
it be modeled?

Although two sister cells are clones with identical genetic material, asymmetry in E. coli
division makes sense biologically. E. coli grows and reproduces by dividing roughly at its
middle. Each cell thus has a new pole (created at the division of its mother) and an old one
(one of the two original poles of its mother), see Figure 1 in [Stewart et al., 2005]. The
cell that inherits the old pole of its mother is called the old pole cell, the other one is
called the new pole cell. It is suspected that the two cells inherit different material, or
material of different quality, from their mother cell. Therefore, each cell has a type: old
pole (O) or new pole (N) cell.

In experimental data, one usually does not know the type of the original cell and of its two
daughters at the root of the genealogy, but from generation 2 on, the type of each cell is
known. For further generations, one can associate with a cell not only its type, but also the
sequence of types of its ancestors, see Figure 1. The original ancestor is labelled 1 and the
two daughters of cell n are labelled 2n for the new pole one and 2n + 1 for the old pole one.
Therefore, even-labelled cells are of type N and odd-labelled cells are of type O, and the
whole sequence of types of their ancestors can be retrieved from the decomposition of their
label in base 2 (with 0 coding for N and 1 coding for O). For instance, cell number 19 is of
type NOO, which means that it is of type O, its mother is of type O and its grandmother is of
type N.
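The labelling convention can be illustrated with a short Python sketch. It is not part of the original study, and the function names `generation`, `mother` and `type_sequence` are hypothetical. It assumes, as stated above, that the types of the root and of its two daughters are unknown, so the first two binary digits of a label are skipped.

```python
def generation(label: int) -> int:
    """Generation of a cell, the original ancestor (label 1) being generation 0."""
    if label < 1:
        raise ValueError("labels start at 1")
    return label.bit_length() - 1


def mother(label: int) -> int:
    """Label of the mother cell (the daughters of cell n are 2n and 2n + 1)."""
    return label // 2


def type_sequence(label: int) -> str:
    """Pole types along the lineage of a cell, oldest known ancestor first.

    The leading binary digit is the root and the next digit is a
    generation-1 cell; both have unknown type and are skipped.
    Digit 0 codes for a new pole (N), digit 1 for an old pole (O).
    """
    bits = bin(label)[2:]
    return "".join("O" if b == "1" else "N" for b in bits[2:])


print(type_sequence(19))  # NOO
```

Cell 19 has binary expansion 10011; dropping the first two digits leaves 011, that is NOO, in agreement with the example above.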



Figure 1: Cell division binary tree with the type 
of each cell 

An interesting question is thus to find out whether the respective growth rates of sister
cells are statistically different or not, and whether cells that have accumulated old poles
along the divisions have a slower growth rate. The starting point of the present work is that
the latter questions have seemingly opposite answers in the biological literature: in
[Stewart et al., 2005], the growth rate of older cells is significantly slowed down, whereas
in [Wang et al., 2012] it is stable. We provide the data sets from both of these papers, and
our aim is to conduct a new statistical study of both data sets to investigate the behavior of
the growth rate of E. coli and try to decide whether both experiments yield contradictory
results or not.

Footnote: Figure 1 of [Stewart et al., 2005] is available at:
http://journals.plos.org/plosbiology/article/figure/image?size=medium&id=info:doi/10.1371/journal.pbio.0030045.g001

This paper is organized as follows. In Section 2, we describe in detail the two data sets from
[Stewart et al., 2005] and [Wang et al., 2012] and explain which statistical investigations
have been conducted on each of them in the literature. In Section 3, we give the results of
the new statistical experiments we conducted on these data sets. We present our conclusion in
Section 4.


2 Two tree-structured data sets

We first describe the data sets from [Stewart et al., 2005] and [Wang et al., 2012] and the
relevant literature.

2.1 Data set from Stewart et al.

The first data set comes from [Stewart et al., 2005]. The authors followed the growth of 94
micro-colonies of E. coli cells by video-microscopy. Each recording starts with a single cell
(randomly selected from previous colonies) and stops after 7 to 9 generations of new cells.
From the images, they measured the growth rate of 22732 cells in 101 (possibly incomplete)
genealogical binary trees, as shown in Figure 1. The type of each cell is also known from
generation 2 on, together with its complete lineage.

Footnote: for a sample film from [Stewart et al., 2005], see:
http://journals.plos.org/plosbiology/article/asset?unique&id=info:doi/10.1371/journal.pbio.0030045.sv001

In [Stewart et al., 2005], the authors conclude that "the old pole is a significant marker for
multiple phenotypes associated with aging, namely, decreased metabolic efficiency (reduced
growth rate), reduced offspring biomass production, and an increased chance of death". They
studied the average genealogical tree and all pairs of sister cells from generation 8 as if
they were independent. More rigorous statistical studies, taking into account the dependencies
induced by the tree structure, have been conducted in [Guyon et al., 2005], [Guyon, 2007],
[de Saporta et al., 2011], [de Saporta et al., 2012] and [de Saporta et al., 2014]. All those
papers rely on the assumption of a tree-adapted autoregressive structure for the growth rate
of daughter cells as a function of that of their mother, called the Bifurcating Autoregressive
(BAR) model. All conclude that the asymmetry between the growth rates of sister cells is
statistically significant.

The data are provided in the file data_stewart.txt. Each line corresponds to a single cell.
There are 22732 observed cells in 101 trees (some films have multiple trees). The recorded
values are given in Table 1.

column   data
1        tree number
2        cell number within tree
3        mother cell number
4        cell generation within tree
5        mother cell generation
6        cell growth rate
7        mother cell growth rate
8        number of consecutive old poles
9        number of consecutive new poles
10       number of consecutive old poles for mother cell
11       number of consecutive new poles for mother cell

Table 1: Recorded data for data set data_stewart.txt.

The value -1 stands for not available. For instance, line 100 reads

1. 103. 51. 6. 5. 0.0348970 0.0368848 3. 0. 2. 0.

which means that cell 103 from tree 1 is in generation 6 and has a growth rate of 0.0348970.
It is an old pole cell and inherited 3 consecutive old poles (type NNOOO). Its mother is
labelled 51 (note that 103 = 2 × 51 + 1), belongs to generation 5 (5 = 6 - 1), has a growth
rate of 0.0368848, and is an old pole cell which inherited 2 consecutive old poles (type
NNOO). The growth rates of tree 1 sorted by generation are presented in Figure 2.
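As an illustration of the record layout of Table 1, the following Python sketch parses one line and checks it against the labelling convention. It is only a sketch: the field names, the `StewartCell` type and the assumption that fields are separated by whitespace are ours, not a documented format of data_stewart.txt.

```python
from typing import NamedTuple


class StewartCell(NamedTuple):
    tree: int
    cell: int
    mother: int
    generation: int
    mother_generation: int
    rate: float
    mother_rate: float
    old_poles: int
    new_poles: int
    mother_old_poles: int
    mother_new_poles: int


def parse_stewart_line(line: str) -> StewartCell:
    """Parse one record, assuming 11 whitespace-separated numeric fields."""
    f = [float(token) for token in line.split()]
    if len(f) != 11:
        raise ValueError(f"expected 11 fields, got {len(f)}")
    return StewartCell(
        int(f[0]), int(f[1]), int(f[2]), int(f[3]), int(f[4]),
        f[5], f[6],
        int(f[7]), int(f[8]), int(f[9]), int(f[10]),
    )


def is_consistent(c: StewartCell) -> bool:
    """Check the label arithmetic: mother = cell // 2, generations differ by 1."""
    if c.mother < 0:  # -1 stands for not available
        return True
    return (
        c.mother == c.cell // 2
        and c.mother_generation == c.generation - 1
        and c.generation == c.cell.bit_length() - 1
    )


record = parse_stewart_line("1. 103. 51. 6. 5. 0.0348970 0.0368848 3. 0. 2. 0.")
print(record.cell, record.mother, is_consistent(record))
```

Such a check is a cheap way to detect shifted columns before any statistical analysis.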



Figure 2: Cell growth rates sorted by generation for Tree 1 in Stewart data set.


2.2 Data set from Wang et al. 


The second data set is extracted from the richer data set of [Wang et al., 2012]. The authors
filmed and measured the growth and division of cells trapped in a channel, ensuring that the
old pole daughter is always selected, see Figure 1 in [Wang et al., 2012]. Only the cell
cumulating successive old poles is observed, together with its sister. It corresponds to the
grey cell sub-tree in Figure 1. Thus, the whole tree is not observed, but the observations can
go on for a very large number of generations (up to 302). Unlike in [Stewart et al., 2005],
the cumulated old pole cells do not exhibit a reduced growth rate but a steady state of
growth. The authors conclude that they have "shown a striking constant growth rate of the
mother cells of E. coli and their immediate sister cells for hundreds of generations".

The distribution of the interdivision time of E. coli has been studied using the data set from
[Wang et al., 2012] in [Doumic et al., 2015] and [Robert et al., 2014], using a piecewise
deterministic Markov process framework. In [Robert et al., 2014], the question is to determine
which factor triggers division: the age or the size of the cell. It has been shown that a
distribution of the bacterium lifetime depending solely on its age does not match experimental
data, while a distribution depending on size does fit the data. In [Doumic et al., 2015],
nonparametric statistical inference was also conducted on the experimental data to estimate
the interdivision time distribution, assuming division is size-dependent.

To the best of our knowledge, this data set has not yet been used to compare the growth rates
of sister cells.

The data are provided in the file data_wang.txt. Each line corresponds to a single cell. There
are 45255 observed cells in 224 channels. The recorded values are given in Table 2.


column   data
1        tree number
2        cell generation within tree
3        mother cell generation
4        cell growth rate
5        mother cell growth rate
6        number of consecutive old poles
7        number of consecutive new poles
8        number of consecutive old poles for mother cell
9        number of consecutive new poles for mother cell

Table 2: Recorded data for data set data_wang.txt.


We did not include the cell numbers in the trees as they grow exponentially and can be
retrieved from the generation number and the type. The value -1 stands for not available. For
instance, line 100 reads

1. 50. 49. 0.0337894 0.0303264 0. 1. 49. 0.

which means that cell number $2^{51} - 2$ from tree 1 is in generation 50 and has a growth
rate of 0.0337894. It is a new pole cell. Its mother is labelled $2^{50} - 1$, belongs to
generation 49, has a growth rate of 0.0303264, and is an old pole cell which inherited 49
consecutive old poles. Note that in this data set, old pole cells have cumulated at least as
many old poles as the rank of their generation, and new pole cells always have an old pole
cell mother. The growth rates of tree 1 sorted by generation are presented in Figure 3.

Footnote: Figure 1 of [Wang et al., 2012] is available at:
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2902570/figure/F1/
For a sample film, see:
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2902570/bin/NIHMS203820-supplement-03.mp4
Figure 3: Cell growth rates sorted by generation for Tree 1 in Wang data set.


2.3 Comparison of data sets 


The main difficulty in analyzing these data sets lies in the special dependence structure
coming from the genealogical trees. To take this into account, one may use the BAR model from
[Guyon, 2007], [de Saporta et al., 2011], [de Saporta et al., 2012] and
[de Saporta et al., 2014]. Indeed, it has been successfully applied to the first data set.
However, in the first set, one observes (almost) complete short trees, whereas in the second
one, one observes very long comb-like lineages. This structure does not fit into the
admissible observation framework of [de Saporta et al., 2011], [de Saporta et al., 2012] and
[de Saporta et al., 2014] because it involves a critical Galton-Watson observation tree, where
individuals of type O always have 2 offspring, and individuals of type N always have no
offspring. More generally, as the observed trees in both sets have very different shapes, one
cannot run the same statistical procedure on both sets, making their comparison more
intricate. Last, but not least, as often with biological data, both sets are very noisy. A
qualitative study may therefore be more informative than a quantitative one.

The rest of this paper presents our new investigation of both data sets, the main aim being to
investigate asymmetry and decide whether they lead to contradictory conclusions or not.


3 New investigation of data sets

3.1 Preprocessing of raw data, Wang data set

The first difference between the two data sets is that, for Stewart's data, we directly
received the growth rate of each cell, whereas for Wang's data, we had access to the raw data
of cell lengths over time and computed the growth rates ourselves from an exponential
regression, following the growth model presented in the introduction. When the whole life of
the cell from birth to division is not observed (typically for the first and last cells in a
given channel), the computation is impossible, thus we attributed the value -1. We did the
same when some recorded lengths are negative. Otherwise, we provide the raw results, including
possibly negative growth rates.
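The exponential regression can be sketched as an ordinary least-squares fit of the logarithm of the length against time, since a size $x e^{rt}$ gives $\log(\text{size}) = \log x + r t$. The Python snippet below is a minimal illustration under that assumption; the function name `growth_rate` and the input layout (two parallel sequences of times and lengths) are hypothetical, and the exact fitting procedure used for the data may differ.

```python
import math
from typing import Sequence

NOT_AVAILABLE = -1.0  # marker for a growth rate that cannot be computed


def growth_rate(times: Sequence[float], lengths: Sequence[float]) -> float:
    """Least-squares slope of log(length) against time.

    Returns NOT_AVAILABLE when the fit is impossible: fewer than two
    points, all times equal, or a non-positive recorded length (the
    logarithm is undefined, which covers negative lengths).
    """
    if len(times) != len(lengths) or len(times) < 2:
        return NOT_AVAILABLE
    if any(length <= 0 for length in lengths):
        return NOT_AVAILABLE
    logs = [math.log(length) for length in lengths]
    t_bar = sum(times) / len(times)
    y_bar = sum(logs) / len(logs)
    sxx = sum((t - t_bar) ** 2 for t in times)
    if sxx == 0:
        return NOT_AVAILABLE
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, logs))
    return sxy / sxx


# Noise-free lengths growing at rate 0.03 from birth size 2.0.
times = [0, 5, 10, 15, 20]
lengths = [2.0 * math.exp(0.03 * t) for t in times]
print(round(growth_rate(times, lengths), 6))
```

Nothing in this fit forces the slope to be positive, which is why noisy length measurements can produce negative growth rates.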

We first tried to work on the raw growth rates, but we quickly realized that there were too
many aberrant data. Thus we developed a preprocessing based on the following observations, see
Figure 4:

1. some trees are globally aberrant (b),

2. some trees are globally good with a chaotic ending, probably due to filamentation (c, d),

3. some trees are globally good with a few aberrant measures of growth rates (a).

This led us to remove aberrant trees and to mark cells with an outlying growth rate as
aberrant (growth rate value set to -1). It appears that filamenting cells are automatically
marked as aberrant by this procedure. Here is our detailed procedure.
Steps for preprocessing Wang data

1. Remove trees smaller than 20 generations.

2. Remove aberrant trees, using a criterion based on a comparison between the distribution of
   growth rates within the tree and the global distribution of growth rates for the whole data
   set:

   (a) compute robust estimates for the mean $m$ and variability $\sigma$ of growth rates over
       all remaining trees, using the R functions mean(., trim=.05) and mad(.);

   (b) for each tree, compute its mean growth rate $m_t$ (usual mean), and remove the tree if
       $|m_t - m| > \sigma$.

3. For each remaining tree:

   (a) compute the median growth rate of old pole cells, $m_O$, and of new pole cells, $m_N$;

   (b) mark each old pole cell whose growth rate is outside $[m_O - 3\sigma, m_O + 3\sigma]$ as
       an outlier;

   (c) mark each new pole cell whose growth rate is outside $[m_N - 3\sigma, m_N + 3\sigma]$ as
       an outlier.
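The steps above can be sketched in Python as follows. This is an illustration rather than the code used for the study, which relied on R. Several points are assumptions of ours: each tree is a list of hypothetical `(generation, is_old_pole, rate)` tuples, the size of a tree is taken to be its number of distinct generations, and the $\sigma$ used in step 3 is the global estimate from step 2(a). The helpers mimic R's `mean(., trim=.05)` and `mad(.)`, including the default scaling constant 1.4826 of `mad`.

```python
import math
from statistics import mean, median

NOT_AVAILABLE = -1.0  # marker used for missing or aberrant growth rates


def trimmed_mean(values, trim=0.05):
    """Mean after dropping floor(n * trim) values at each end."""
    xs = sorted(values)
    k = math.floor(len(xs) * trim)
    return mean(xs[k:len(xs) - k])


def mad(values, constant=1.4826):
    """Scaled median absolute deviation."""
    med = median(values)
    return constant * median([abs(v - med) for v in values])


def available(cells):
    """Growth rates of a tree that are not marked as missing."""
    return [r for _, _, r in cells if r != NOT_AVAILABLE]


def preprocess(trees, min_generations=20, width=3.0):
    """trees: dict mapping a tree id to a list of (generation, is_old_pole, rate)."""
    # Step 1: drop short trees (and trees without any usable rate).
    kept = {t: c for t, c in trees.items()
            if len({g for g, _, _ in c}) >= min_generations and available(c)}
    if not kept:
        return {}
    # Step 2: drop trees whose usual mean is further than sigma from m.
    pooled = [r for c in kept.values() for r in available(c)]
    m, sigma = trimmed_mean(pooled), mad(pooled)
    kept = {t: c for t, c in kept.items()
            if abs(mean(available(c)) - m) <= sigma}
    # Step 3: mark outliers, separately for old pole and new pole cells.
    cleaned = {}
    for t, cells in kept.items():
        medians = {}
        for pole in (True, False):
            rates = [r for _, p, r in cells if p == pole and r != NOT_AVAILABLE]
            if rates:
                medians[pole] = median(rates)
        cleaned[t] = [
            (g, p, NOT_AVAILABLE)
            if r != NOT_AVAILABLE and abs(r - medians[p]) > width * sigma
            else (g, p, r)
            for g, p, r in cells
        ]
    return cleaned
```

Using medians and a MAD-based scale keeps the thresholds themselves insensitive to the aberrant values they are meant to detect.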


Figure 4: Growth rate (y-axis) vs generation number (x-axis) of old pole cells, for four trees
from the Wang data set, panels (a) to (d). Panel (d) shows Tree 169.

3.2 BAR model

Our first idea to compare both sets was to fit a BAR model to Wang's data set, and compare
with [de Saporta et al., 2014], where the BAR model is fitted to Stewart's data. It is
especially appealing as the BAR model can account for a steady growth rate for the cumulative
old pole lineage in the long run.


The first difficulty stems from the special comb-like structure of Wang's data trees. As
explained in a previous section, it corresponds to critical Galton-Watson observation trees,
thus existing results from the literature cannot be applied. However, one can readily use
ideas similar to those of [de Saporta et al., 2014] to propose an estimator with good
convergence properties (that will not be detailed here).

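Let $X_{j,k}$ be the growth rate of cell number $k$ in tree number $j$, with the numbering
explained in the introduction and in Figure 1. The asymmetric BAR model is an autoregressive
model defined as follows: $X_{j,1}$ is arbitrary and, for $k \ge 1$, one has

\[
\begin{aligned}
X_{j,2k}   &= a_0 + b_0 X_{j,k} + \varepsilon_{j,2k}, \\
X_{j,2k+1} &= a_1 + b_1 X_{j,k} + \varepsilon_{j,2k+1},
\end{aligned}
\]

where $(\varepsilon_{j,k})$ is a noise sequence and $\theta = (a_0, b_0, a_1, b_1)$ are the
parameters to be estimated. In order to take into account possibly missing data (in our
example, they will mostly correspond to deleted aberrant values), we introduce the observation
process $(\delta_{j,k})$ defined by $\delta_{j,k} = 1$ if the growth rate of cell $k$ from
tree number $j$ is available (i.e. not set at -1), and $\delta_{j,k} = 0$ otherwise. The
least-squares estimator of $\theta$, taking into account all the data from the $m$ trees up to
generation $n$, is given by

\[
\hat\theta_n =
\begin{pmatrix} \hat a_{0,n} \\ \hat b_{0,n} \\ \hat a_{1,n} \\ \hat b_{1,n} \end{pmatrix}
= \Sigma_n^{-1} \sum_{j=1}^{m} \sum_{\ell=0}^{n-1}
\begin{pmatrix}
\delta_{j,2h_\ell}\, \delta_{j,h_\ell}\, X_{j,2h_\ell} \\
\delta_{j,2h_\ell}\, \delta_{j,h_\ell}\, X_{j,h_\ell} X_{j,2h_\ell} \\
\delta_{j,2h_\ell+1}\, \delta_{j,h_\ell}\, X_{j,2h_\ell+1} \\
\delta_{j,2h_\ell+1}\, \delta_{j,h_\ell}\, X_{j,h_\ell} X_{j,2h_\ell+1}
\end{pmatrix},
\]

with $h_\ell = 2^{\ell+1} - 1$ and where the normalizing matrix is given by

\[
\Sigma_n = \begin{pmatrix} S^0_n & 0 \\ 0 & S^1_n \end{pmatrix},
\]

and, for $i \in \{0, 1\}$,

\[
S^i_n = \sum_{j=1}^{m} \sum_{\ell=0}^{n-1}
\begin{pmatrix}
\delta_{j,2h_\ell+i}\, \delta_{j,h_\ell} & \delta_{j,2h_\ell+i}\, \delta_{j,h_\ell}\, X_{j,h_\ell} \\
\delta_{j,2h_\ell+i}\, \delta_{j,h_\ell}\, X_{j,h_\ell} & \delta_{j,2h_\ell+i}\, \delta_{j,h_\ell}\, (X_{j,h_\ell})^2
\end{pmatrix}.
\]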


Note that only the growth rates of cells from the comb-like subtree are taken into account, as
they are the only available data in this case, i.e. cells labelled $2^n - 1$ and $2^n - 2$
according to the numbering described in the introduction. It can be shown, with techniques
similar to those of [de Saporta et al., 2014], that under mild assumptions on the noise and
observation sequences, this estimator is convergent and asymptotically normally distributed.
We obtain the estimation results given in Table 3.



parameter          estimation   95% confidence interval
$\hat a_{0,n}$     0.0304       [0.0200; 0.0410]
$\hat b_{0,n}$     0.0664       [-0.4652; 0.5980]
$\hat a_{1,n}$     0.0281       [0.0178; 0.0385]
$\hat b_{1,n}$     0.0994       [-0.3194; 0.5182]

Table 3: Estimated parameters for the BAR model, Wang data, n = 302, m = 224.
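On comb-like data, a least-squares fit of the BAR model reduces to two separate simple regressions on the old pole mothers: one for their new pole daughters $(a_0, b_0)$ and one for their old pole daughters $(a_1, b_1)$. The Python sketch below illustrates this under assumptions of ours: each tree is a hypothetical pair of lists `(old, new)` indexed by generation, with `None` for unavailable values. It computes point estimates only and does not reproduce the confidence intervals of Table 3.

```python
from typing import List, Optional, Sequence, Tuple

Lineage = Sequence[Optional[float]]


def fit_line(pairs: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Least-squares (a, b) in child = a + b * mother from (mother, child) pairs."""
    n = len(pairs)
    sx = sum(x for x, _ in pairs)
    sxx = sum(x * x for x, _ in pairs)
    sy = sum(y for _, y in pairs)
    sxy = sum(x * y for x, y in pairs)
    det = n * sxx - sx * sx  # determinant of the 2x2 block [[n, sx], [sx, sxx]]
    if det == 0:
        raise ValueError("singular normalizing matrix")
    a = (sxx * sy - sx * sxy) / det
    b = (n * sxy - sx * sy) / det
    return a, b


def bar_estimate(trees: Sequence[Tuple[Lineage, Lineage]]):
    """Estimate (a0, b0, a1, b1) from comb-like trees.

    For each tree, old[l] is the rate of the old pole cell of generation l
    and new[l] the rate of its new pole sister; None means not available.
    A pair contributes only when both mother and daughter are observed.
    """
    pairs = {0: [], 1: []}
    for old, new in trees:
        for l in range(len(old) - 1):
            mother = old[l]
            if mother is None:
                continue
            if l + 1 < len(new) and new[l + 1] is not None:
                pairs[0].append((mother, new[l + 1]))
            if old[l + 1] is not None:
                pairs[1].append((mother, old[l + 1]))
    a0, b0 = fit_line(pairs[0])
    a1, b1 = fit_line(pairs[1])
    return a0, b0, a1, b1
```

Requiring both members of a mother-daughter pair to be observed plays the role of the product of observation indicators in the estimator.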


The estimated variance of the noise sequence is very high (about 0.5) compared to the
magnitude of the data, leading to wide confidence intervals. In particular, as 0 belongs to
the confidence intervals of $\hat b_{0,n}$ and $\hat b_{1,n}$ (refer to Table 3), one cannot
assert that the autoregressive structure is relevant, and we cannot rely on this model to test
the symmetry of old and new pole cells.

How to deal with the high level of noise is an important question for this data set. We tried
imputation methods for the values missing due to aberrant marking, but this appeared to
introduce a strong bias in the tests. We observed that the analysis was very sensitive to the
choice of the imputation method, thus we gave up the idea and went on working with
uncorrected, non-aberrant data.


3.3 Memory from the mother and higher ancestors, Wang data set

For each tree, we selected the old cell branch (upmost branch in Figure 1) and we fit an
additive regression model explaining the growth rate of a cell by that of its mother and that
of its grandmother:

\[
r_n = \beta_m m_n + \beta_g g_n + \beta_0 + e_n, \tag{1}
\]

where

• $r_n$ is the growth rate of the n-th generation cell ($X_{2^{n+1}-1}$ with the previous
  notation),

• $m_n$ is the growth rate of its mother,

• $g_n$ is the growth rate of its grandmother,

• $e_n$ is the prediction error.

The triple $(\beta_0, \beta_m, \beta_g)$ depends on the tree. The R command is
lm(rate~ratemo+rategdmo). Histograms of p-values for the significance of the mother
coefficient $\beta_m$ (a) and for the grandmother coefficient $\beta_g$ (b) are plotted in
Figure 5.
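For readers who do not use R, the point estimates of Equation (1) can be sketched in plain Python by solving the normal equations. This is only an illustration: the function names are hypothetical, the lineage is assumed to have no missing values, and the p-values reported by R's `lm` are not computed here.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_memory_model(rates):
    """Ordinary least squares for r_n = beta_0 + beta_m * r_{n-1} + beta_g * r_{n-2} + e_n.

    rates: growth rates along one old pole lineage, ordered by generation,
    so that the mother of cell n is cell n - 1 and its grandmother cell n - 2.
    Returns (beta_0, beta_m, beta_g).
    """
    rows = [(1.0, rates[n - 1], rates[n - 2]) for n in range(2, len(rates))]
    ys = rates[2:]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    beta_0, beta_m, beta_g = solve(xtx, xty)
    return beta_0, beta_m, beta_g
```

Fitting one triple per tree, as in the text, amounts to calling `fit_memory_model` once on each old pole lineage.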



Figure 5: Histogram of p-values for the significance of the mother coefficient $\beta_m$ (a)
and for the grandmother coefficient $\beta_g$ (b), Wang data set.

We conclude that the effect of the grandmother is not significant. The coefficient $\beta_m$
is significantly positive, with a value around 0.3.


3.4 Comparison of old pole and new pole statistics, both sets

As it is not possible to compare the BAR model 
for both data sets, we turned to more basic 
tools to compare the influence of the mother 
and higher ancestors on the growth rate of a 
given cell. Here again, as both data sets do not 
have the same structure, one cannot run the 
exact same experiments on both sets. Recall 
that asymmetry is already proved rigorously for 
Stewart’s data. 

The authors of [Stewart et al., 2005] averaged and normalized the growth rate data within each
generation and each tree (combined with another indicator of distance to the edge of the
microcolony) to obtain their Figure 3, showing a linear increase (respectively decrease) of
the mean normalized growth rate of cells with cumulated consecutive new poles (respectively
cumulated consecutive old poles). Although the lower generations contain significantly fewer
individuals than higher generations, and cells with an identical number of cumulated old/new
poles can exist within the same genealogical tree, we used the same approach to try to find
out how many new poles are required to obtain a rejuvenated cell (with respect to its growth
rate).

Footnote: Figure 3 of [Stewart et al., 2005] is available at:
http://journals.plos.org/plosbiology/article/figure/image?size=large&id=info:doi/10.1371/journal.pbio.0030045.g003

We averaged the growth rates of cells within the same generation of the same tree
(irrespective of the edge distance), and normalized the growth rate of each cell by the
corresponding average. Then we computed the mean normalized growth rate over all cells that
have cumulated n new poles or n old poles (for 1 ≤ n ≤ 7). The results are given in
Figure 6(a); circles correspond to cumulated new-pole cells and stars to cumulated old-pole
cells. This figure corresponds to Figure 3 in [Stewart et al., 2005]. Then we compared the
mean of all new-pole cells whose mother cumulated n old poles, and of old-pole cells whose
mother cumulated n new poles (for 1 ≤ n ≤ 6), see Figure 6(b); circles correspond to new-pole
cells with a cumulated old-pole mother and stars to old-pole cells with a cumulated new-pole
mother. The scales of both figures are the same to make visual comparison easier. The linear
regression slope coefficients are respectively 4.4% for the new pole cells and -1.1% for the
old pole ones in Figure 6(a), 0.1% for the new pole cells and -0.5% for the old pole ones in
Figure 6(b).

One can conclude that one new pole is enough to forget an accumulation of old poles, and
similarly one old pole is enough to forget an accumulation of new poles.
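The normalization and averaging described above can be sketched as follows in Python. The input layout, a list of hypothetical `(tree, generation, rate, n_new, n_old)` tuples holding only available growth rates, and the function name `normalized_means` are assumptions made for the illustration.

```python
from collections import defaultdict
from statistics import mean


def normalized_means(cells):
    """Mean normalized growth rate by number of cumulated poles.

    cells: iterable of (tree, generation, rate, n_new, n_old) tuples, where
    n_new (resp. n_old) is the number of consecutive new (resp. old) poles
    cumulated by the cell. Each rate is divided by the average rate of its
    own generation within its own tree.
    Returns two dicts: n -> mean over cells with n consecutive new poles,
    and n -> mean over cells with n consecutive old poles.
    """
    cells = list(cells)
    by_group = defaultdict(list)
    for tree, gen, rate, _, _ in cells:
        by_group[(tree, gen)].append(rate)
    group_mean = {key: mean(rates) for key, rates in by_group.items()}

    new_pole, old_pole = defaultdict(list), defaultdict(list)
    for tree, gen, rate, n_new, n_old in cells:
        normalized = rate / group_mean[(tree, gen)]
        if n_new > 0:
            new_pole[n_new].append(normalized)
        if n_old > 0:
            old_pole[n_old].append(normalized)
    return (
        {n: mean(v) for n, v in new_pole.items()},
        {n: mean(v) for n, v in old_pole.items()},
    )
```

Dividing by the generation average within each tree removes the generation-to-generation drift of the raw growth rates, so that only the relative position of a cell among its contemporaries is compared.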


Figure 6: Mean normalized growth rate within generations and trees for cells that have
cumulated (a) n consecutive new poles (circles) or n consecutive old poles (stars) for
1 ≤ n ≤ 7; (b) 1 new pole after n consecutive old poles (circles), 1 old pole after n
consecutive new poles (stars), for 1 ≤ n ≤ 6, Stewart data set.

As regards Wang's data, we compared the mean growth rate of new pole and old pole cells as
well as the mother-daughter correlation. More specifically, we found the following.

1. A Student test for comparison of the mean growth rate of old pole cells and of new pole
   cells yields a p-value < $10^{-15}$, and 1% confidence intervals for the mean growth rates
   are [0.0309, 0.0310] for old pole cells and [0.0319, 0.0320] for new pole cells.

2. Regarding the daughter-mother correlation, we computed one confidence interval for the
   overall correlation between the growth rates of old pole daughters and those of their
   mothers, and another one for new pole daughters and their mothers: the 1% confidence
   interval for the correlation between the growth rates of new pole daughters and those of
   their mothers is [0.085, 0.123], and the corresponding interval for old pole cells is
   [0.125, 0.160].

A significant difference thus holds for the mean as well as for the correlation with the
mother cell between old pole and new pole sister cells.

We have also plotted, in Figure 7, a histogram of regression coefficients (w.r.t. the mother's
growth rate) in both cases, corresponding to the coefficients $\beta_m$ in Equation (1) with
$\beta_g$ set to 0. The difference is not clear, but it seems that in the case of old poles,
the dispersion is smaller.

Figure 7: Histogram of regression coefficients $\beta_m$, for new pole cells (a) and for old
pole cells (b), Wang data set.
3.5 Stationarity, both sets

We then investigated the stationarity of the growth rate in the data. The two data sets
correspond to different experimental procedures, which creates potential differences in the
initial physiological state of the cells. In [Stewart et al., 2005], the initial cells were
picked at random from a population growing in a liquid medium and then plated on a solid
medium, where they grew and divided to form microcolonies. The cells undergo a plating stress
when placed on the solid medium, which is well known to biologists, see e.g.
[Rolfe et al., 2012] and [Cuny et al., 2007]. This leads to a transient phase of reduced
growth rates in the first generations, see Figures 8 and 9.

Figure 8: Box plots of growth rates for cells in generations 2 to 8, Stewart data set.

Figure 9: Histogram of growth rates for cells in generations 2 (a), 5 (b) and 8 (c), Stewart
data set.

In [Wang et al., 2012], on the contrary, the first generations of cells were removed, so that
only a steady state is observed, see Figure 10, which is the counterpart of Figure 8 and
presents box plots of the growth rates of cells for Wang's data for generations 2, 3, 4, 5,
10, 20, 30, 40, 50, 100 and 200.

Figure 10: Box plots of growth rates for cells in generations 2, 3, 4, 5, 10, 20, 30, 40, 50,
100 and 200, Wang data set. Outliers (growth rate negative or larger than 0.08) are excluded
for clarity.

For Wang's data set, one can be a bit more precise regarding stationarity for the cumulated
old pole lineage. We implemented the following procedure (on old pole cells only):

1. For each tree:

   (a) the residuals of an ARMA(1,1) model are computed;

   (b) these residuals are split into a first half and a second half;

   (c) a Kolmogorov test (R command ks.test) is used to compare the distributions of the two
       subseries.

2. We plot in Figure 11 a histogram of the p-values.

Figure 11: P-values for the Kolmogorov test of stationarity, Wang data set.


Let us explain our motivation for step 1(a). In order to use the right threshold in the
Kolmogorov test, we need in theory the data to be independent. Assuming that the growth rate
is an AR(1) process, and that the data are noisy observations of the growth rate, we indeed
get an ARMA(1,1) process. Steps 1(b) and 1(c) are standard. Concerning step 2, under $H_0$
(stationarity), the p-values are uniformly distributed; otherwise, they are more concentrated
around 0.

We see in Figure 11 a uniform distribution of the p-values, which is characteristic of the
non-significance of the hypothesis of different distributions. We obtain the same conclusion
if we replace the Kolmogorov test with a Student test (change ks.test into t.test).
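The split-half comparison can be sketched in Python as follows. This is an illustration under stated assumptions, not the code behind Figure 11: the ARMA(1,1) fit is omitted and the residual series is taken as given, and the p-value uses the classical asymptotic Kolmogorov series with a small-sample correction factor, whereas R's `ks.test` may use an exact computation for small samples. All function names are hypothetical.

```python
import math


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def ks_pvalue(d, n_a, n_b, terms=100):
    """Approximate p-value from the asymptotic Kolmogorov distribution."""
    n_eff = n_a * n_b / (n_a + n_b)
    lam = (math.sqrt(n_eff) + 0.12 + 0.11 / math.sqrt(n_eff)) * d
    if lam < 0.2:  # the series converges slowly here and the p-value is close to 1
        return 1.0
    total = sum(
        2 * (-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam)
        for k in range(1, terms + 1)
    )
    return min(1.0, max(0.0, total))


def split_half_pvalue(residuals):
    """Compare the first and second halves of a residual series (at least 2 values)."""
    half = len(residuals) // 2
    first, second = residuals[:half], residuals[half:]
    return ks_pvalue(ks_statistic(first, second), len(first), len(second))
```

Applying `split_half_pvalue` to the residuals of each tree gives one p-value per tree, whose histogram can then be compared with the uniform distribution.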

4 Conclusion 

In these two data sets we made efforts to take 
into account the tree structure of the data. We 
tried different statistical procedures that can be 
summed up as follows. 

Wang data. Because of the simple structure of this data set, each tree is here just the grey
subtree in Figure 1. We have tried dynamical models in which the growth rate of a cell may
have a multi-generation memory, with coefficients possibly dependent on the tree (mixed
effects). We did not find a significant improvement over the simplest model, where the rate of
a cell depends only on that of its mother, and that of the grandmother has no significant
influence. We found that

1. the old pole cell growth rate is signifi¬ 
cantly more correlated to its mother than 
the new pole cell growth rate; 

2. the mean old pole cell growth rate is sig¬ 
nificantly smaller than the mean new pole 
cell growth rate; 

3. the stationarity cannot be rejected. 


Stewart data. The tree structure induces dependency in the data, which we have taken into
account in our testing procedures. It is established in the literature that old pole and new
pole cells have significantly different growth rates on this data set. In addition, we found
the following.

1. There is no stationarity of the growth rate across generations. This means that the
   initial stress of the experiment does not have time to vanish within the first 9
   generations.

2. The most relevant factor is the number of generations since the last change of pole type,
   and not the whole sequence of types along the lineage of a given cell. For example, cell 17
   (NNO) in Figure 1 has a similar growth rate to cells 21 and 29 (ONO), and NONOONN (300,
   428) to NNONN (68, 100).

To conclude, in both data sets, we recover a statistically significant difference between the
growth rates of sister cells. Therefore, asymmetry is present in the division of E. coli, even
after hundreds of generations.

The apparent conflict between the two data sets may simply come from observations at different
phases: Stewart's data are still in a transient phase, whereas Wang's data are stationary.
From this point of view, the two data sets are not contradictory. To the best of our
knowledge, there is no available data set of E. coli division with both transient and steady
states. It would be interesting to design an experiment where both the transient and the
stationary phases could be observed on the same colonies.